Crosslingual tandem-SGMM: exploiting out-of-language data for acoustic model and feature level adaptation
Abstract
Recent studies have shown that speech recognizers may benefit from data in languages other than the target language through efficient acoustic model- or feature-level adaptation. Crosslingual Tandem-Subspace Gaussian Mixture Models (SGMM) can successfully combine acoustic model- and feature-level adaptation techniques. More specifically, we focus on under-resourced languages (Afrikaans in our case) and perform feature-level adaptation by estimating phone class posterior features with a Multilayer Perceptron trained on data from a similar language with large amounts of available speech data (Dutch in our case). The same Dutch data can also be exploited at the acoustic model level by training globally shared SGMM parameters in a crosslingual way. The two adaptation techniques are indeed complementary and result in a crosslingual Tandem-SGMM system that yields a relative improvement of about 22% over a standard speech recognizer on an Afrikaans phoneme recognition task. Interestingly, an eventual score-level combination of the individual SGMM systems yields an additional 3% relative improvement.
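For readers unfamiliar with SGMMs, the split between globally shared and state-specific parameters mentioned in the abstract can be sketched with the basic SGMM formulation (no sub-states or speaker subspaces). This is the commonly used textbook form, not necessarily the exact configuration of the paper. The symbols below are notation introduced here for illustration: $j$ indexes HMM states, $i$ indexes the $I$ shared Gaussians, and $\mathbf{v}_j$ is the low-dimensional state vector.

```latex
\begin{align}
  p(\mathbf{x} \mid j) &= \sum_{i=1}^{I} w_{ji}\,
      \mathcal{N}\!\left(\mathbf{x};\ \boldsymbol{\mu}_{ji},\ \boldsymbol{\Sigma}_i\right), \\
  \boldsymbol{\mu}_{ji} &= \mathbf{M}_i\, \mathbf{v}_j, \\
  w_{ji} &= \frac{\exp\!\left(\mathbf{w}_i^{\top} \mathbf{v}_j\right)}
                 {\sum_{i'=1}^{I} \exp\!\left(\mathbf{w}_{i'}^{\top} \mathbf{v}_j\right)}.
\end{align}
```

Under this reading, the globally shared parameters $\mathbf{M}_i$, $\mathbf{w}_i$, and $\boldsymbol{\Sigma}_i$ do not depend on the state index $j$, which is what allows them to be estimated on out-of-language data (Dutch here), while only the state vectors $\mathbf{v}_j$ need to be estimated from the limited target-language data (Afrikaans here).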
Similar resources
Application of Subspace Gaussian Mixture Models in Contrastive Acoustic Scenarios
This paper describes experimental results of applying Subspace Gaussian Mixture Models (SGMMs) in two completely diverse acoustic scenarios: (a) for a Large Vocabulary Continuous Speech Recognition (LVCSR) task over (well-resourced) English meeting data and (b) for acoustic modeling of under-resourced Afrikaans telephone data. In both cases, the performance of SGMM models is compared with a conve...
Feature and Score Level Combination of Subspace Gaussians in LVCSR Task
In this paper, we investigate the employment of discriminatively trained acoustic features modeled by Subspace Gaussian Mixture Models (SGMMs) for Rich Transcription meeting recognition. More specifically, first, we focus on exploiting various types of complex features estimated using neural networks combined with conventional cepstral features and modeled by standard HMM/GMMs and SGMMs. Then, outpu...
Crosslingual Adaptation of Semi-continuous HMMs Using Acoustic Sub-simplex Projection
With the demand for automatic speech recognition (ASR) systems in many markets, the question of porting an ASR system to a new language is of practical interest. Transferring already existing hidden Markov models (HMM) from a source to the target language is seen as a key step to cope with this task. Typically, such a crosslingual model adaptation task consists of a three step procedur...
Parallel Neural Network Features for Improved Tandem Acoustic Modeling
The combination of acoustic models or features is a standard approach to exploit various knowledge sources. This paper investigates the concatenation of different bottleneck (BN) neural network (NN) outputs for tandem acoustic modeling. Thus, combination of NN features is performed via Gaussian mixture models (GMM). Complementarity between the NN feature representations is attained by using var...
Robust Estimation and Adaptation of Subspace Gaussian Mixture Models for Automatic Speech Recognition
In conventional hidden Markov model (HMM) based speech recognisers, the emitting HMM states are modelled by Gaussian Mixture Models (GMMs), with parameters being estimated directly from the training data. However, in Subspace Gaussian mixture model (SGMM) based acoustic modelling, the parameters of each state model are derived from the globally shared model subspaces which are normally low dimensi...